Abstract —Mining discriminative subgraph patterns from 
graph data has attracted great interest in recent years. It has a 
wide variety of applications in disease diagnosis, neuroimaging, 
etc. Most research on subgraph mining focuses on the graph 
representation alone. However, in many real-world applications, 
side information is available along with the graph data. For 
example, for neurological disorder identification, in addition to 
the brain networks derived from neuroimaging data, hundreds of 
clinical, immunologic, serologic and cognitive measures may also 
be documented for each subject. These measures compose multiple 
side views encoding a tremendous amount of supplemental 
information for diagnostic purposes, yet are often ignored. In this 
paper, we study the problem of discriminative subgraph selection 
using multiple side views and propose a novel solution to find 
an optimal set of subgraph features for graph classification by 
exploring a plurality of side views. We derive a feature evaluation 
criterion, named gSide, to estimate the usefulness of subgraph 
patterns based upon side views. Then we develop a branch-and- 
bound algorithm, called gMSV, to efficiently search for optimal 
subgraph features by integrating the subgraph mining process 
and the procedure of discriminative feature selection. Empirical 
studies on graph classification tasks for neurological disorders 
using brain networks demonstrate that subgraph patterns selected 
by the multi-side-view guided subgraph selection approach 
can effectively boost graph classification performance and are 
relevant to disease diagnosis. 

Index Terms —subgraph pattern, graph mining, side information, 
brain network. 

I. Introduction 

Recent years have witnessed an increasing amount of data 
in the form of graph representations, which involve complex 
structures, e.g., brain networks, social networks. These data are 
inherently represented as a set of nodes and links, instead of 
feature vectors as in traditional data. For example, brain networks 
are composed of brain regions as the nodes, e.g., insula, hippocampus, 
thalamus, and functional/structural connectivities 
between the brain regions as the links. The linkage structure 
in these brain networks can encode tremendous information 
about the mental health of the human subjects. For example, in 
the brain networks derived from functional magnetic resonance 
imaging (fMRI), functional links can encode the 
correlations between the functional activities of brain regions, 
while structural links in diffusion tensor imaging (DTI) brain 
networks can capture the number of neural fibers connecting 
different brain regions. The complex structures and the lack of 
vector representations within these graph data raise a challenge 
for data mining. An effective model for mining the graph data 
should be able to extract a set of subgraph patterns for further 
analysis. Motivated by such challenges, graph mining research 
problems, in particular graph classification, have received 
considerable attention in the last decade. 

The graph classification problem has been studied extensively. 
Conventional approaches focus on mining discriminative 
subgraphs from the graph view alone. This is usually feasible 
for applications like molecular graph analysis, where a large 
set of graph instances with labels is available. For brain network 
analysis, however, we usually have only a small number 
of graph instances, ranging from 30 to 100 brain networks. 
In these applications, the information from the graph 
view alone is usually not sufficient for mining important 
subgraphs. Fortunately, side information is 
available along with the graph data for neurological disorder 
identification. For example, in neurological studies, hundreds 
of clinical, immunologic, serologic and cognitive measures 
may be available for each subject, in addition to brain 
networks derived from the neuroimaging data, as shown in 
Figure 1. These measures compose multiple side views which 
contain a tremendous amount of supplemental information 
for diagnostic purposes. It is desirable to extract valuable 
information from a plurality of side views to guide the process 
of subgraph mining in brain networks. 

Despite its value and significance, the feature selection 
problem for graph data using auxiliary views has not been 
studied in this context so far. There are two major difficulties 
in learning from multiple side views for graph classification, 
as follows: 

The primary view in graph representation: Graph data naturally 
composes the primary view for graph mining problems, 
from which we want to select discriminative subgraph patterns 
for graph classification. However, it raises a challenge for data 
mining with the complex structures and the lack of vector 
representations. Conventional feature selection approaches in 
vector spaces usually assume that a set of features are given 
before conducting feature selection. In the context of graph 
data, however, subgraph features are embedded within the 
graph structures and usually it is not feasible to enumerate 
the full set of subgraph features for a graph dataset before 
feature selection. Actually, the number of subgraph features 
grows exponentially with the size of graphs. 




[Figure 1 panels: Side view 3: serologic measures; Side view 4: clinical measures] 

Fig. 1. An example of multiple side views associated with brain networks 
in medical studies. 


The side views in vector representations: In many applications, 
side information is available along with the graph data 
and usually exists in the form of vector representations. That 
is to say, an instance is represented by a graph and additional 
vector-based features at the same time. This raises 
the problem of how to leverage the relationship between the 
primary graph view and a plurality of side views, and how 
to facilitate the subgraph mining procedure by exploring the 
vector-based auxiliary views. For example, in brain networks, 
discriminative subgraph patterns for neurological disorders 
indicate brain injuries associated with particular regions. Such 
changes can potentially be expressed in other medical tests of the 
subject, e.g., clinical, immunologic, serologic and cognitive 
measures. Thus, it would be desirable to select subgraph 
features that are consistent with these side views. 


Figure 2(a) illustrates the process of selecting subgraph 
patterns in conventional graph classification approaches. 
The valuable information embedded in side views is 
not fully leveraged in the feature selection process. Most subgraph 
mining approaches focus on the drug discovery problem, which 
has access to a great amount of graph data for chemical 
compounds. For neurological disorder identification, however, 
there are usually limited subjects with a small sample size 
of brain networks available. Therefore, it is critical to learn 
knowledge from other possible sources. We notice that transfer 
learning can borrow supervision knowledge from the source 
domain to help the learning on the target domain, e.g., 
finding a good feature representation, mapping relational 
knowledge, and learning across graph databases. 
However, as far as we know, these approaches do not consider transferring 
complementary information from vector-based side views to a 
graph database whose instances are complex structural graphs. 

(a) Conventional methods treat side views and subgraph patterns separately and 
may only concatenate them in the final step for graph classification. 

(b) Our method uses side views as guidance for the process of selecting 
subgraph patterns. 

Fig. 2. Two strategies of leveraging side views in feature selection process 
for graph classification. 

To solve the above problems, in this paper, we introduce a 
novel framework for discriminative subgraph selection using 
multiple side views. Our framework is illustrated in 
Figure 2(b). In contrast to existing subgraph mining approaches 
that focus on a single view of the graph representation, our 
method can explore multiple vector-based side views to find an 
optimal set of subgraph features for graph classification. We 
first verify side information consistency via statistical hypothesis 
testing. Based on auxiliary views and the available label 
information, we design an evaluation criterion for subgraph 
features, named gSide. By deriving a lower bound, we develop 
a branch-and-bound algorithm, called gMSV, to efficiently 
search for optimal subgraph features with pruning, thereby 
avoiding exhaustive enumeration of all subgraph features. In 
order to evaluate our proposed model, we conduct experiments 
on graph classification tasks for neurological disorders, using 
fMRI and DTI brain networks. The experiments demonstrate 
that our subgraph selection approach using multiple side 
views can effectively boost graph classification performances. 
Moreover, we show that gMSV is more efficient by pruning 
the subgraph search space via gSide. 

II. Problem Formulation 

Before presenting the subgraph feature selection model, we 
first introduce the notations that will be used throughout this 
paper. Let $\mathcal{D} = \{G_1, \cdots, G_n\}$ denote the graph dataset, which 
consists of $n$ graph objects. The graphs within $\mathcal{D}$ are labeled 
by $[y_1, \cdots, y_n]^\top$, where $y_i \in \{-1, +1\}$ denotes the binary 
class label of $G_i$. 

Definition 1 (Side view): A side view is a set of vector-based 
features $\mathbf{z}_i = [z_1, \cdots, z_d]^\top$ associated with each graph 
object $G_i$, where $d$ is the dimensionality of this view. A side 
view is denoted as $\mathcal{Z} = \{\mathbf{z}_1, \cdots, \mathbf{z}_n\}$. 






































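We assume that there are multiple side views $\{\mathcal{Z}^{(1)}, \cdots, \mathcal{Z}^{(v)}\}$ 
available for the graph dataset $\mathcal{D}$, where 
$v$ is the number of side views. We employ kernels $\kappa^{(p)}$ on $\mathcal{Z}^{(p)}$, 
such that $\kappa_{ij}^{(p)}$ represents the similarity between $G_i$ and 
$G_j$ from the perspective of the $p$-th view. The RBF kernel 
is used as the default kernel in this paper, unless otherwise 
specified: 

$$\kappa_{ij}^{(p)} = \exp\left( -\frac{\left\| \mathbf{z}_i^{(p)} - \mathbf{z}_j^{(p)} \right\|_2^2}{d^{(p)}} \right) \tag{1}$$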
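
As an illustration of the per-view kernel, the following is a minimal sketch in plain Python. It assumes that the squared Euclidean distance is scaled by the dimensionality of the view; the function name `rbf_kernel_matrix` and the toy data are hypothetical and not part of the original work.

```python
import math

def rbf_kernel_matrix(Z):
    """Pairwise RBF similarities for one side view.

    Z is a list of n feature vectors (one per subject), each of length d.
    Assumption: the squared Euclidean distance is divided by d.
    """
    n, d = len(Z), len(Z[0])
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(Z[i], Z[j]))
            K[i][j] = math.exp(-sq_dist / d)
    return K

# Toy view: three subjects, two min-max normalized measures each.
Z = [[0.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = rbf_kernel_matrix(Z)
```

Each side view would yield its own matrix of this kind, so that a subject pair can be similar in one view and dissimilar in another.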


Definition 2 (Graph): A graph is represented as $G = (V, E)$, 
where $V = \{v_1, \cdots, v_{n_v}\}$ is the set of vertices and 
$E \subseteq V \times V$ is the set of edges. 

Definition 3 (Subgraph): Let $G' = (V', E')$ and $G = (V, E)$ 
be two graphs. $G'$ is a subgraph of $G$ (denoted as 
$G' \subseteq G$) iff $V' \subseteq V$ and $E' \subseteq E$. If $G'$ is a subgraph of $G$, 
then $G$ is a supergraph of $G'$. 

In this paper, we adopt the idea of subgraph-based graph 
classification approaches, which assume that each graph object 
$G_j$ is represented as a binary vector $\mathbf{x}_j = [x_{1j}, \cdots, x_{mj}]^\top$ 
associated with the full set of subgraph patterns $\{g_1, \cdots, g_m\}$ 
for the graph dataset $\{G_1, \cdots, G_n\}$. Here $x_{ij} \in \{0, 1\}$ is the 
binary feature of $G_j$ corresponding to the subgraph pattern $g_i$, 
and $x_{ij} = 1$ iff $g_i$ is a subgraph of $G_j$ ($g_i \subseteq G_j$), otherwise 
$x_{ij} = 0$. Let $X = [x_{ij}]^{m \times n}$ denote the matrix consisting of 
binary feature vectors using $\mathcal{S}$ to represent the graph dataset 
$\mathcal{D}$: $X = [\mathbf{x}_1, \cdots, \mathbf{x}_n] = [\mathbf{f}_1, \cdots, \mathbf{f}_m]^\top \in \{0, 1\}^{m \times n}$. The full 
set $\mathcal{S}$ is usually too large to be enumerated. There is usually 
only a subset of subgraph patterns $\mathcal{T} \subseteq \mathcal{S}$ relevant to the task 
of graph classification. We briefly summarize the notations 
used in this paper in Table I. 
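
To make the binary representation concrete, here is a small sketch. It assumes that every brain region occurs at most once per graph, so that a subgraph pattern can be treated as a set of labeled edges and containment reduces to a subset test; general subgraph isomorphism is not handled. The helper names and the toy graphs are hypothetical.

```python
def edge(u, v):
    """Undirected edge stored as an order-independent key."""
    return frozenset((u, v))

def indicator_matrix(patterns, graphs):
    """X[i][j] = 1 iff pattern i is contained in graph j (edge-set subset)."""
    return [[1 if p <= g else 0 for g in graphs] for p in patterns]

# Three toy brain networks over uniquely labeled regions.
graphs = [
    {edge("insula", "thalamus"), edge("thalamus", "hippocampus")},
    {edge("insula", "thalamus")},
    {edge("insula", "hippocampus")},
]
# Two candidate subgraph patterns.
patterns = [
    {edge("insula", "thalamus")},
    {edge("insula", "thalamus"), edge("thalamus", "hippocampus")},
]
X = indicator_matrix(patterns, graphs)
```

Row i of `X` plays the role of the indicator vector of pattern i, and column j is the feature vector of graph j.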

The key issue of discriminative subgraph selection using 
multiple side views is how to find an optimal set of subgraph 
patterns for graph classification by exploring the auxiliary 
views. This is non-trivial due to the following problems: 

• How to leverage the valuable information embedded in 
multiple side views to evaluate the usefulness of a set of 
subgraph patterns? 

• How to efficiently search for the optimal subgraph patterns 
without exhaustive enumeration in the primary 
graph space? 

In the following sections, we will first introduce the optimization 
framework for selecting discriminative subgraph 
features using multiple side views. Next, we will describe our 
subgraph mining strategy using the evaluation criterion derived 
from the optimization solution. 


III. Data Analysis 

A motivation for this work is that the side information could 
be strongly correlated with the health state of a subject. Before 
proceeding, we first introduce real-world data used in this 
work and investigate whether the available information from 
side views has any potential impact on neurological disorder 
identification. 


A. Data Collections 

In this paper, we study the real-world datasets collected 
from the Chicago Early HIV Infection Study at Northwestern 
University. The clinical cohort includes 56 HIV patients (positive) 
and 21 seronegative controls (negative). The datasets contain 
functional magnetic resonance imaging (fMRI) and diffusion 
tensor imaging (DTI) for each subject, from which brain 
networks can be constructed, respectively. 

For fMRI data, we used the DPARSF toolbox^1 to extract a 
sequence of responses from each of the 116 anatomical volumes 
of interest (AVOI), where each AVOI represents a different 
brain region. The correlations of brain activities among different 
brain regions are computed. Positive correlations are used 
as links among brain regions. In detail, functional images 
were realigned to the first volume, slice timing corrected, 
normalized to the MNI template and spatially smoothed with 
an 8-mm Gaussian kernel. The time series were linearly detrended 
and temporally band-pass filtered (0.01-0.08 Hz). 
Before the correlation analysis, several sources of spurious 
variance were also removed from the data through linear 
regression: (i) six parameters obtained by rigid body correction 
of head motion, (ii) the whole-brain signal averaged over a 
fixed region in atlas space, (iii) signal from a ventricular region 
of interest, and (iv) signal from a region centered in the white 
matter. Each brain is represented as a graph with 90 nodes 
corresponding to 90 cerebral regions, excluding 26 cerebellar 
regions. 

For DTI data, we used the FSL toolbox^2 to extract the brain 
networks. The processing pipeline consists of the following 
steps: (i) correct the distortions induced by eddy currents in the 
gradient coils and use affine registration to a reference volume 
for head motion, (ii) delete non-brain tissue from the image of 
the whole head, (iii) fit the diffusion tensor model at 
each voxel, (iv) build up distributions on diffusion parameters 
at each voxel, and (v) repetitively sample from the distributions 
of voxel-wise principal diffusion directions. As with the fMRI 
data, the DTI images were parcellated into 90 regions (45 for 
each hemisphere) by propagating the Automated Anatomical 
Labeling (AAL) to each image. Min-max normalization 
was applied on link weights. 

In addition, for each subject, hundreds of clinical, imaging, 
immunologic, serologic and cognitive measures were documented. 
Seven groups of measurements were investigated in 
our datasets, including neuropsychological tests, flow cytometry, 
plasma luminex, freesurfer, overall brain microstructure, 
localized brain microstructure, and brain volumetry. Each group 
can be regarded as a distinct view that partially reflects subject 
status, and measurements from different medical examinations 
can provide complementary information. Moreover, we 
preprocessed the features by min-max normalization before 
employing the RBF kernel on each view. 


1 http://rfmri.org/DPARSF 
2 http://fsl.fmrib.ox.ac.uk/fsl/fslwiki 



TABLE I 

Important notations. 

Symbol                                                Definition and Description 
$|\cdot|$                                             cardinality of a set 
$\|\cdot\|$                                           norm of a vector 
$\mathcal{D} = \{G_1, \cdots, G_n\}$                  given graph dataset, $G_i$ denotes the $i$-th graph in the dataset 
$\mathbf{y} = [y_1, \cdots, y_n]^\top$                class label vector for graphs in $\mathcal{D}$, $y_i \in \{-1, +1\}$ 
$\mathcal{S} = \{g_1, \cdots, g_m\}$                  set of all subgraph patterns in the graph dataset $\mathcal{D}$ 
$\mathbf{f}_i = [f_{i1}, \cdots, f_{in}]^\top$        binary vector for subgraph pattern $g_i$, $f_{ij} = 1$ iff $g_i \subseteq G_j$, otherwise $f_{ij} = 0$ 
$\mathbf{x}_j = [x_{1j}, \cdots, x_{mj}]^\top$        binary vector for $G_j$ using subgraph patterns in $\mathcal{S}$, $x_{ij} = 1$ iff $g_i \subseteq G_j$, otherwise $x_{ij} = 0$ 
$X = [x_{ij}]^{m \times n}$                           matrix of all binary vectors in the dataset, $X = [\mathbf{x}_1, \cdots, \mathbf{x}_n] = [\mathbf{f}_1, \cdots, \mathbf{f}_m]^\top \in \{0, 1\}^{m \times n}$ 
$\mathcal{T}$                                         set of selected subgraph patterns, $\mathcal{T} \subseteq \mathcal{S}$ 
$\mathcal{I}_{\mathcal{T}} \in \{0, 1\}^{m \times m}$ diagonal matrix indicating which subgraph patterns are selected from $\mathcal{S}$ into $\mathcal{T}$ 
min_sup                                               minimum frequency threshold; frequent subgraphs are contained by at least min_sup $\times |\mathcal{D}|$ graphs 
$k$                                                   number of subgraph patterns to be selected 
$\lambda^{(p)}$                                       weight of the $p$-th side view (default: 1) 
$\kappa^{(p)}$                                        kernel function on the $p$-th side view (default: RBF kernel) 


B. Verifying Side Information Consistency 

We study the potential impact of side information on select¬ 
ing subgraph patterns via statistical hypothesis testing. Side 
information consistency suggests that the similarity of side 
view features between instances with the same label should 
have higher probability to be larger than that with different 
labels. We use hypothesis testing to validate whether this 
statement holds in the fMRI and DTI datasets. 

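For each side view, we first construct two vectors $\mathbf{a}_s^{(p)}$ and 
$\mathbf{a}_d^{(p)}$ with an equal number of elements, sampled from the sets 
$\mathcal{A}_s^{(p)}$ and $\mathcal{A}_d^{(p)}$, respectively: 

$$\mathcal{A}_s^{(p)} = \{ \kappa_{ij}^{(p)} \mid y_i y_j = 1 \} \tag{2}$$

$$\mathcal{A}_d^{(p)} = \{ \kappa_{ij}^{(p)} \mid y_i y_j = -1 \} \tag{3}$$

Then, we form a two-sample one-tail t-test to validate the 
existence of side information consistency. We test whether 
there is sufficient evidence to support the hypothesis that the 
similarity score in $\mathbf{a}_s^{(p)}$ is larger than that in $\mathbf{a}_d^{(p)}$. The null 
hypothesis is $H_0: \mu_s^{(p)} - \mu_d^{(p)} \le 0$, and the alternative 
hypothesis is $H_1: \mu_s^{(p)} - \mu_d^{(p)} > 0$, where $\mu_s^{(p)}$ and $\mu_d^{(p)}$ 
represent the sample means of similarity scores in the two 
groups, respectively. 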
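
The test above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: it takes each unordered pair once, skips the equal-size sampling step, and uses the unequal-variance (Welch) form of the t statistic, since the text does not say which variant was used. The function names and the toy kernel matrix are hypothetical.

```python
import math
from statistics import mean, variance

def similarity_sets(K, y):
    """Split pairwise similarities by label agreement (pairs with i < j)."""
    same, diff = [], []
    n = len(y)
    for i in range(n):
        for j in range(i + 1, n):
            (same if y[i] * y[j] == 1 else diff).append(K[i][j])
    return same, diff

def welch_t(a, b):
    """Two-sample t statistic with unequal variances (assumed variant)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Toy kernel matrix for four subjects, two per class.
K = [
    [1.0, 0.9, 0.2, 0.3],
    [0.9, 1.0, 0.1, 0.2],
    [0.2, 0.1, 1.0, 0.8],
    [0.3, 0.2, 0.8, 1.0],
]
y = [1, 1, -1, -1]
same, diff = similarity_sets(K, y)
t_stat = welch_t(same, diff)
```

A large positive statistic favors the alternative hypothesis that same-label pairs are more similar; turning it into a p-value requires the t distribution, which is left out of this sketch.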


TABLE II 

Hypothesis testing results (p-values) to verify side information consistency. 

Side views                        fMRI dataset    DTI dataset 
neuropsychological tests          1.3220e-20      3.6015e-12 
flow cytometry                    5.9497e-57      5.0346e-75 
plasma luminex                    9.8102e-06      7.6090e-06 
freesurfer                        2.9823e-06      1.5116e-03 
overall brain microstructure      1.0403e-02      8.1027e-03 
localized brain microstructure    3.1108e-04      5.7040e-04 
brain volumetry                   2.0024e-04      1.2660e-02 


The t-test results (p-values) are summarized in Table II. The 
results show that there is strong evidence, with significance 
level $\alpha = 0.05$, to reject the null hypothesis on the two 
datasets. In other words, we validate the existence of side 
information consistency in neurological disorder identification, 
thereby paving the way for our next study of leveraging 
multiple side views for discriminative subgraph selection. 

IV. Multi-Side-View Discriminative 
Subgraph Selection 

In this section, we address the first problem discussed in 
Section II by formulating the discriminative subgraph selection 
problem as a general optimization framework as follows: 

$$\mathcal{T}^* = \operatorname*{argmin}_{\mathcal{T} \subseteq \mathcal{S}} \mathcal{F}(\mathcal{T}) \quad \text{s.t.} \quad |\mathcal{T}| \le k \tag{4}$$

where $|\cdot|$ denotes the cardinality and $k$ is the maximum number 
of features selected. $\mathcal{F}(\mathcal{T})$ is the evaluation criterion to estimate 
the score (the lower the better in this paper) of a 
subset of subgraph patterns $\mathcal{T}$. $\mathcal{T}^*$ denotes the optimal set 
of subgraph patterns, $\mathcal{T}^* \subseteq \mathcal{S}$. 

A. Exploring Multiple Side Views: gSide 

Following the observations in Section III-B that the side 
view information is clearly correlated with the prespecified 
label information, we assume that the set of optimal subgraph 
patterns should have the following property: the 
similarity/distance between instances in the space of subgraph 
features should be consistent with that in the space of a side 
view. That is to say, if two instances are similar in the space of 
the $p$-th view (i.e., a high $\kappa_{ij}^{(p)}$ value), they should also be close 
to each other in the space of subgraph features (i.e., a small 
distance between subgraph feature vectors). On the other hand, 
if two instances are dissimilar in the space of the $p$-th view 
(i.e., a low $\kappa_{ij}^{(p)}$ value), they should be far away from each 
other in the space of subgraph features (i.e., a large distance 
between subgraph feature vectors). Therefore, our objective 
is to minimize the distance between subgraph 
features of each pair of similar instances in each side view, 
and maximize the distance between dissimilar instances. This 
idea is formulated as follows: 

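$$\operatorname*{argmin}_{\mathcal{T} \subseteq \mathcal{S}} \frac{1}{2} \sum_{p=1}^{v} \lambda^{(p)} \sum_{i,j=1}^{n} \left\| \mathcal{I}_{\mathcal{T}} \mathbf{x}_i - \mathcal{I}_{\mathcal{T}} \mathbf{x}_j \right\|_2^2 \Theta_{ij}^{(p)} \tag{5}$$

where $\mathcal{I}_{\mathcal{T}}$ is a diagonal matrix indicating which subgraph 
features are selected into $\mathcal{T}$ from $\mathcal{S}$, $(\mathcal{I}_{\mathcal{T}})_{ii} = 1$ iff $g_i \in \mathcal{T}$, 
otherwise $(\mathcal{I}_{\mathcal{T}})_{ii} = 0$. The parameters $\lambda^{(p)} \ge 0$ are employed 
to control the contributions from each view. 

$$\Theta_{ij}^{(p)} = \begin{cases} \dfrac{1}{|\mathcal{H}^{(p)}|} & (i, j) \in \mathcal{H}^{(p)} \\[2mm] -\dfrac{1}{|\mathcal{L}^{(p)}|} & (i, j) \in \mathcal{L}^{(p)} \end{cases} \tag{6}$$

where $\mathcal{H}^{(p)} = \{(i, j) \mid \kappa_{ij}^{(p)} \ge \mu^{(p)}\}$, $\mathcal{L}^{(p)} = \{(i, j) \mid \kappa_{ij}^{(p)} < \mu^{(p)}\}$, 
and $\mu^{(p)}$ is the mean value of $\kappa_{ij}^{(p)}$, i.e., $\frac{1}{n^2} \sum_{i,j=1}^{n} \kappa_{ij}^{(p)}$. 
This normalization is to balance the effect of similar instances 
and dissimilar instances. 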
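
A minimal sketch of how the per-view weight matrix could be built from a kernel matrix is given below. It follows the description above (pairs at or above the mean similarity get a positive weight, the rest a negative weight, each side normalized by its pair count); the function name `theta_matrix` and the toy kernel values are hypothetical.

```python
def theta_matrix(K):
    """Signed, normalized pair weights for one side view."""
    n = len(K)
    mu = sum(sum(row) for row in K) / (n * n)      # mean similarity
    pairs = [(i, j) for i in range(n) for j in range(n)]
    high = [(i, j) for i, j in pairs if K[i][j] >= mu]
    low = [(i, j) for i, j in pairs if K[i][j] < mu]
    theta = [[0.0] * n for _ in range(n)]
    for i, j in high:
        theta[i][j] = 1.0 / len(high)              # similar pairs
    for i, j in low:
        theta[i][j] = -1.0 / len(low)              # dissimilar pairs
    return theta

# Toy kernel matrix for three subjects.
K = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
theta = theta_matrix(K)
total = sum(sum(row) for row in theta)
```

Because the positive weights sum to one and the negative weights sum to minus one, the two groups contribute equally regardless of how many pairs fall on each side of the mean.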

Intuitively, Eq. (5) will minimize the distance between 
subgraph features of similar instance-pairs with $\kappa_{ij}^{(p)} \ge \mu^{(p)}$, 
while maximizing the distance between dissimilar instance-pairs 
with $\kappa_{ij}^{(p)} < \mu^{(p)}$ in each view. In this way, the side view 
information is effectively used to guide the process of discriminative 
subgraph selection. The fact verified in Section III-B 
that the side view information is clearly correlated with the 
prespecified label information can be very useful, especially 
in the semi-supervised setting. 

With prespecified information for labeled graphs, we further 
consider that the optimal set of subgraph patterns should 
satisfy the following constraints: labeled graphs in the same 
class should be close to each other; labeled graphs in different 
classes should be far away from each other. Intuitively, these 
constraints tend to select the most discriminative subgraph 
patterns based on the graph labels. Such an idea has been 
well explored in the context of dimensionality reduction and 
feature selection. 

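The constraints above can be mathematically formulated as 
minimizing the loss function: 

$$\operatorname*{argmin}_{\mathcal{T} \subseteq \mathcal{S}} \frac{1}{2} \sum_{i,j=1}^{n} \left\| \mathcal{I}_{\mathcal{T}} \mathbf{x}_i - \mathcal{I}_{\mathcal{T}} \mathbf{x}_j \right\|_2^2 \Omega_{ij} \tag{7}$$

where 

$$\Omega_{ij} = \begin{cases} \dfrac{1}{|\mathcal{M}|} & (i, j) \in \mathcal{M} \\[2mm] -\dfrac{1}{|\mathcal{C}|} & (i, j) \in \mathcal{C} \\[2mm] 0 & \text{otherwise} \end{cases} \tag{8}$$

and $\mathcal{M} = \{(i, j) \mid y_i y_j = 1\}$ denotes the set of pairwise 
constraints between graphs with the same label, and $\mathcal{C} = 
\{(i, j) \mid y_i y_j = -1\}$ denotes the set of pairwise constraints 
between graphs with different labels. 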

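By defining matrix $\Phi \in \mathbb{R}^{n \times n}$ as 

$$\Phi_{ij} = \Omega_{ij} + \sum_{p=1}^{v} \lambda^{(p)} \Theta_{ij}^{(p)} \tag{9}$$

we can combine and rewrite the function in Eq. (5) and Eq. (7) 
as 

$$\begin{aligned} \mathcal{F}(\mathcal{T}) &= \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \left\| \mathcal{I}_{\mathcal{T}} \mathbf{x}_i - \mathcal{I}_{\mathcal{T}} \mathbf{x}_j \right\|_2^2 \Phi_{ij} \\ &= \operatorname{tr}\left( \mathcal{I}_{\mathcal{T}}^\top X (D - \Phi) X^\top \mathcal{I}_{\mathcal{T}} \right) \\ &= \operatorname{tr}\left( \mathcal{I}_{\mathcal{T}}^\top X L X^\top \mathcal{I}_{\mathcal{T}} \right) \\ &= \sum_{g_i \in \mathcal{T}} \mathbf{f}_i^\top L \mathbf{f}_i \end{aligned} \tag{10}$$

where $\operatorname{tr}(\cdot)$ is the trace of a matrix, $D$ is a diagonal matrix 
whose entries are column sums of $\Phi$, i.e., $D_{ii} = \sum_j \Phi_{ij}$, and 
$L = D - \Phi$ is a Laplacian matrix. 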
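
The rewriting above can be checked numerically on a toy example. The sketch below assumes a symmetric weight matrix (so that row sums and column sums coincide) and compares the pairwise form of the objective with the sum of quadratic forms over the selected patterns. All function names and the toy numbers are hypothetical.

```python
def laplacian(Phi):
    """L = D - Phi, with D the diagonal matrix of row sums of Phi."""
    n = len(Phi)
    return [[(sum(Phi[i]) if i == j else 0.0) - Phi[i][j] for j in range(n)]
            for i in range(n)]

def quad_form(f, L):
    """f^T L f for an indicator vector f."""
    n = len(f)
    return sum(f[p] * L[p][q] * f[q] for p in range(n) for q in range(n))

def pairwise_objective(F, selected, Phi):
    """(1/2) * sum_ij ||x_i - x_j||^2 * Phi_ij over the selected patterns."""
    n = len(Phi)
    total = 0.0
    for i in range(n):
        for j in range(n):
            d2 = sum((F[g][i] - F[g][j]) ** 2 for g in selected)
            total += d2 * Phi[i][j]
    return 0.5 * total

# Symmetric toy weights for three graphs; rows of F are pattern indicators.
Phi = [
    [0.0, 0.5, -0.25],
    [0.5, 0.0, -0.25],
    [-0.25, -0.25, 0.0],
]
F = [[1, 1, 0], [1, 0, 0]]
selected = [0, 1]

lhs = pairwise_objective(F, selected, Phi)
rhs = sum(quad_form(F[g], laplacian(Phi)) for g in selected)
```

The practical consequence is that the score of a feature set decomposes into independent per-pattern terms, so each pattern can be scored on its own.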

Definition 4 (gSide): Let $\mathcal{D} = \{G_1, \cdots, G_n\}$ denote a 
graph dataset with multiple side views. Suppose $\Phi$ is a matrix 
defined as Eq. (9), and $L$ is a Laplacian matrix defined as 
$L = D - \Phi$, where $D$ is a diagonal matrix, $D_{ii} = \sum_j \Phi_{ij}$. We 
define an evaluation criterion $q$, called gSide, for a subgraph 
pattern $g_i$ as 

$$q(g_i) = \mathbf{f}_i^\top L \mathbf{f}_i \tag{11}$$

where $\mathbf{f}_i = [f_{i1}, \cdots, f_{in}]^\top \in \{0, 1\}^n$ is the indicator vector 
for subgraph pattern $g_i$, $f_{ij} = 1$ iff $g_i \subseteq G_j$, otherwise $f_{ij} = 
0$. Since the Laplacian matrix $L$ is positive semi-definite, for 
any subgraph pattern $g_i$, $q(g_i) \ge 0$. 

Based on gSide as defined above, the optimization problem 
in Eq. (4) can be written as 

$$\mathcal{T}^* = \operatorname*{argmin}_{\mathcal{T} \subseteq \mathcal{S}} \sum_{g_i \in \mathcal{T}} q(g_i) \quad \text{s.t.} \quad |\mathcal{T}| \le k \tag{12}$$

The optimal solution to the problem in Eq. (12) can be 
found by using gSide to conduct feature selection on a set 
of subgraph patterns in $\mathcal{S}$. Suppose the gSide values for all 
subgraph patterns are denoted as $q(g_1) \le \cdots \le q(g_m)$ in 
sorted order, then the optimal solution to the optimization 
problem in Eq. (12) is 

$$\mathcal{T}^* = \bigcup_{i=1}^{k} \{g_i\} \tag{13}$$
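
When the candidate patterns are already enumerated, the selection rule above amounts to sorting by score. The following sketch assumes the indicator vectors and the matrix L are given; the names `gside` and `select_top_k` and the toy Laplacian are hypothetical.

```python
def gside(f, L):
    """Score f^T L f of a pattern with indicator vector f (lower is better)."""
    n = len(f)
    return sum(f[p] * L[p][q] * f[q] for p in range(n) for q in range(n))

def select_top_k(F, L, k):
    """Indices of the k patterns with the smallest scores, plus all scores."""
    scores = [gside(f, L) for f in F]
    order = sorted(range(len(F)), key=lambda i: scores[i])
    return order[:k], scores

# Toy Laplacian of a 3-node path graph and three candidate patterns.
L = [
    [1, -1, 0],
    [-1, 2, -1],
    [0, -1, 1],
]
F = [
    [1, 1, 1],
    [1, 0, 0],
    [0, 1, 0],
]
top, scores = select_top_k(F, L, k=2)
```

This exhaustive version is only practical when the full pattern set fits in memory, which is the limitation that motivates a pruned search.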

B. Searching with A Lower Bound: gMSV 

Now we address the second problem discussed in Section II 
and propose an efficient method to find the optimal set of 
subgraph patterns from a graph dataset with multiple side 
views. 

A straightforward solution to the goal of finding an optimal 
feature set is exhaustive enumeration, i.e., we could first 
enumerate all subgraph patterns from a graph dataset, and 
then calculate the gSide values for all subgraph patterns. In 
the context of graph data, however, it is usually not feasible 
to enumerate the full set of subgraph patterns before feature 
selection. Actually, the number of subgraph patterns grows 
exponentially with the size of graphs. Inspired by recent 
advances in graph classification approaches, 
which nest their evaluation criteria into the subgraph 
mining process and develop constraints to prune the search 
space, we adopt a similar approach by deriving a different 
constraint based upon gSide. 





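By adopting the gSpan algorithm proposed by Yan and Han 
[37], we can enumerate all the subgraph patterns for a graph 
dataset in a canonical search space. In order to prune the 
subgraph search space, we now derive a lower bound of the 
gSide value: 

Theorem 1: Given any two subgraph patterns $g_i, g_j \in \mathcal{S}$, 
$g_j$ is a supergraph of $g_i$, i.e., $g_i \subseteq g_j$. The gSide value of $g_j$ 
is bounded by $\hat{q}(g_i)$, i.e., $q(g_j) \ge \hat{q}(g_i)$. $\hat{q}(g_i)$ is defined as 

$$\hat{q}(g_i) \triangleq \mathbf{f}_i^\top \hat{L} \mathbf{f}_i \tag{14}$$

where the matrix $\hat{L}$ is defined as $\hat{L}_{pq} \triangleq \min(0, L_{pq})$. 

Proof. According to Definition 4, 

$$q(g_j) = \mathbf{f}_j^\top L \mathbf{f}_j = \sum_{p,q:\, G_p, G_q \in \mathcal{G}(g_j)} L_{pq} \tag{15}$$

where $\mathcal{G}(g_j) = \{G_k \mid g_j \subseteq G_k, 1 \le k \le n\}$. Since $g_i \subseteq g_j$, 
according to the anti-monotonic property, we have $\mathcal{G}(g_j) \subseteq \mathcal{G}(g_i)$. 
Also, since $\hat{L}_{pq} = \min(0, L_{pq})$, we have $\hat{L}_{pq} \le L_{pq}$ and $\hat{L}_{pq} \le 0$. 
Therefore, 

$$q(g_j) = \sum_{p,q:\, G_p, G_q \in \mathcal{G}(g_j)} L_{pq} \ge \sum_{p,q:\, G_p, G_q \in \mathcal{G}(g_j)} \hat{L}_{pq} \ge \sum_{p,q:\, G_p, G_q \in \mathcal{G}(g_i)} \hat{L}_{pq} = \hat{q}(g_i)$$

Thus, for any $g_i \subseteq g_j$, $q(g_j) \ge \hat{q}(g_i)$. □ 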
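
The bound can be sanity-checked by brute force on a toy matrix. The sketch below relies only on the fact used in the proof, namely that the graphs containing a supergraph form a subset of the graphs containing the pattern itself. It enumerates every such subset and confirms that the score never falls below the bound. Function names and the toy matrix are hypothetical.

```python
from itertools import product

def quad_form(f, L):
    """f^T L f for an indicator vector f."""
    n = len(f)
    return sum(f[p] * L[p][q] * f[q] for p in range(n) for q in range(n))

def gside_lower_bound(f, L):
    """f^T L_hat f, where L_hat keeps only the non-positive entries of L."""
    n = len(f)
    return sum(f[p] * min(0.0, L[p][q]) * f[q]
               for p in range(n) for q in range(n))

L = [
    [1, -1, 0],
    [-1, 2, -1],
    [0, -1, 1],
]
f_i = [1, 1, 1]                      # pattern contained in all three graphs
bound = gside_lower_bound(f_i, L)

# Any supergraph is contained in a subset of the graphs containing g_i.
checks = []
for bits in product([0, 1], repeat=len(f_i)):
    f_j = [b * s for b, s in zip(bits, f_i)]
    checks.append(quad_form(f_j, L) >= bound)
```

The bound is loose by construction, since it discards every positive entry of L, but it is cheap to compute and valid for the whole subtree below a pattern.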


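Algorithm 1 The Proposed Method: gMSV 

Input: $\mathcal{D}$, min_sup, $k$, $\{\lambda^{(p)}, \kappa^{(p)}\}_{p=1}^{v}$ 
Output: $\mathcal{T}$: Set of optimal subgraph patterns 
1: $\mathcal{T} = \emptyset$, $\theta$ = Inf 
2: while unexplored nodes in the DFS code tree $\ne \emptyset$ do 
3:   $g$ = currently explored node in the DFS code tree 
4:   if freq($g$) $\ge$ min_sup then 
5:     if $|\mathcal{T}| < k$ or $q(g) < \theta$ then 
6:       $\mathcal{T} = \mathcal{T} \cup \{g\}$ 
7:       if $|\mathcal{T}| > k$ then 
8:         $g_{max} = \operatorname{argmax}_{g' \in \mathcal{T}} q(g')$ 
9:         $\mathcal{T} = \mathcal{T} / \{g_{max}\}$ 
10:      end if 
11:      $\theta = \max_{g' \in \mathcal{T}} q(g')$ 
12:    end if 
13:    if $\hat{q}(g) < \theta$ then 
14:      Depth-first search the subtree rooted from $g$ 
15:    end if 
16:  end if 
17: end while 
18: return $\mathcal{T}$ 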


We can now nest the lower bound into the subgraph mining 
steps in gSpan to efficiently prune the DFS code tree. During 
the depth-first search through the DFS code tree, we always 
maintain the current top-$k$ best subgraph patterns according 
to gSide and the temporary suboptimal gSide value (denoted 
by $\theta$) among all the gSide values calculated before. If $\hat{q}(g_i) \ge 
\theta$, the gSide value of any supergraph $g_j$ of $g_i$ should be no less 
than $\hat{q}(g_i)$ according to Theorem 1, i.e., $q(g_j) \ge \hat{q}(g_i) \ge \theta$. 
Thus, we can safely prune the subtree rooted from $g_i$ in the 
search space. If $\hat{q}(g_i) < \theta$, we cannot prune this subtree since 
there might exist a supergraph $g_j$ of $g_i$ such that $q(g_j) < \theta$. 
As long as a subgraph $g_i$ can improve the gSide values of 
any subgraphs in $\mathcal{T}$, it is added into $\mathcal{T}$ and the worst 
subgraph is removed from $\mathcal{T}$. Then we recursively search for 
the next subgraph in the DFS code tree. The branch-and-bound 
algorithm gMSV is summarized in Algorithm 1. 
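
The search described above can be sketched on a toy problem. This is not the gSpan-based implementation: as a simplifying assumption, patterns are edge sets over uniquely labeled nodes, enumerated by adding edges in a fixed order, and min_sup is an absolute count rather than a fraction. The function `gmsv_sketch`, the toy graphs and the toy matrix `L` are all hypothetical.

```python
def gmsv_sketch(graphs, edges, L, k, min_sup):
    """Branch-and-bound top-k pattern search with a gSide-style lower bound."""
    n = len(graphs)
    L_hat = [[min(0.0, v) for v in row] for row in L]

    def quad(f, M):
        return sum(f[p] * M[p][q] * f[q] for p in range(n) for q in range(n))

    selected = []              # (score, pattern) pairs, at most k of them
    theta = float("inf")       # worst score currently kept
    explored = 0

    def dfs(pattern, start):
        nonlocal theta, explored
        for idx in range(start, len(edges)):
            g = pattern | {edges[idx]}
            f = [1 if g <= graph else 0 for graph in graphs]
            explored += 1
            if sum(f) < min_sup:
                continue       # infrequent, and so are all its supergraphs
            score = quad(f, L)
            if len(selected) < k or score < theta:
                selected.append((score, g))
                if len(selected) > k:
                    selected.remove(max(selected, key=lambda t: t[0]))
                theta = max(s for s, _ in selected)
            if quad(f, L_hat) < theta:
                dfs(g, idx + 1)    # a supergraph may still beat theta

    dfs(frozenset(), 0)
    return sorted(selected, key=lambda t: t[0]), explored

# Four toy graphs: the first two in one class, the last two in the other.
edges = [(0, 1), (0, 2), (1, 2)]
graphs = [
    {(0, 1), (1, 2)},
    {(0, 1), (1, 2), (0, 2)},
    {(0, 2)},
    {(0, 1)},
]
# Toy matrix: negative within a class, positive across classes.
L = [
    [-1, -1, 1, 1],
    [-1, -1, 1, 1],
    [1, 1, -1, -1],
    [1, 1, -1, -1],
]
best, explored = gmsv_sketch(graphs, edges, L, k=2, min_sup=1)
best_scores = [s for s, _ in best]
```

On this toy input the two retained patterns are those that occur only in the first class, and the pruning test skips part of the 7-pattern search space.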

V. Experiments 

In order to evaluate the performance of the proposed solution 
to the problem of feature selection for graph classification 
using multiple side views, we tested our algorithm on brain 
network datasets derived from neuroimaging, as introduced in 
Section III-A. 


A. Experimental Setup 

To the best of our knowledge, this paper is the first work 
on leveraging side information in the feature selection problem 
for graph classification. In order to evaluate the performance 
of the proposed method, we compare our method with other 
methods using different statistical measures and discriminative 
score functions. For all the compared methods, gSpan [37] is 
used as the underlying searching strategy. Note that although 
alternative algorithms are available, the search 
step efficiency is not the focus of this paper. The compared 
methods are summarized as follows: 


• gMSV: The proposed discriminative subgraph selection 
method using multiple side views. Following the observation 
in Section III-B that side information consistency is 
verified to be significant in all the side views, the parameters 
in gMSV are simply set to $\lambda^{(1)} = \cdots = \lambda^{(v)} = 1$ for 
experimental purposes. In the case where some side views 
are suspected to be redundant, we can adopt the alternative 
optimization strategy to iteratively select discriminative 
subgraph patterns and update view weights. 

• gSSC: A semi-supervised feature selection method for 
graph classification based upon both labeled and unlabeled 
graphs. The parameters in gSSC are set to $\alpha = 
\beta = 1$ unless otherwise specified [20]. 

• Discriminative Subgraphs (Conf, Ratio, Gtest, HSIC): 
Supervised feature selection methods for graph classification 
based upon confidence, frequency ratio, 
G-test score [36] and HSIC [19], respectively. 
The top-$k$ discriminative subgraph features are selected 
in terms of different discrimination criteria. 

• Frequent Subgraphs (Freq): In this approach, the evaluation 
criterion for subgraph feature selection is based 
upon frequency. The top-$k$ frequent subgraph features are 
selected. 


We append the side view data to the subgraph-based graph 
representations computed by the above algorithms before feed¬ 
ing the concatenated feature vectors to the classifier. Another 
baseline that only uses side view data is denoted as MSV. 











For a fair comparison, we used LibSVM with a linear 
kernel as the base classifier for all the compared methods. In 
the experiments, 3-fold cross validations were performed on 
balanced datasets. To get the binary links, we performed simple 
thresholding over the weights of the links. The threshold 
for fMRI and DTI datasets was 0.9 and 0.3, respectively. 

B. Performance on Graph Classification 

The experimental results on fMRI and DTI datasets are 
shown in Figure 3 and Figure 4, respectively. The average 
performances with different numbers of features of each method 
are reported. Classification accuracy is used as the evaluation 
metric. 

In Figure 3, our method gMSV achieves a classification 
accuracy as high as 97.16% on the fMRI dataset, which is 
significantly better than the union of other subgraph-based 
features and side view features. The black solid line denotes 
the method MSV, the simplest baseline that uses only side 
view data. Conf and Ratio do slightly better than MSV. 
Freq adopts an unsupervised process for selecting subgraph 
patterns, resulting in a performance comparable with MSV, 
indicating that there is no additional information from the 
selected subgraphs. Other methods that use different discrimination 
scores without leveraging the guidance from side views 
perform even worse than MSV in graph classification, because 
they evaluate the usefulness of subgraph patterns solely based 
on the limited label information from a small sample size 
of brain networks. The selected subgraph patterns can potentially 
be redundant or irrelevant, thereby compromising the 
effects of side view data. Importantly, gMSV outperforms the 
semi-supervised approach gSSC, which explores the unlabeled 
graphs based on the separability property. This indicates that 
rather than simply considering that unlabeled graphs should 
be separated from each other, it would be better to regularize 
such separability/closeness to be consistent with the available 
side views. 

Similar observations can be found in Figure 4, where gMSV 
outperforms other baselines by achieving a good performance 
as high as 97.33% accuracy on the DTI dataset. We notice that 
only gMSV is able to do better than MSV by adding complementary 
subgraph-based features to the side view features. 
Moreover, the performances of other schemes are not consistent 
over the two datasets. The 2nd and 3rd best schemes, 
Conf and Ratio, for fMRI do not perform as well for DTI. 
These results support our premise that exploring a plurality of 
side views can boost the performance of graph classification, 
and the gSide evaluation criterion in gMSV can find more 
informative subgraph patterns for graph classification than 
subgraphs based on frequency or other discrimination scores. 

Fig. 3. Classification performance on the fMRI dataset with different
numbers of features.

Fig. 4. Classification performance on the DTI dataset with different
numbers of features.

C. Time and Space Complexity

Next, we evaluate the effectiveness of pruning the subgraph
search space by adopting the lower bound of gSide in gMSV.
In this section, we compare the runtime performance of two
implementations of gMSV: the pruning gMSV uses the lower bound of
gSide to prune the search space of subgraph enumerations, as shown
in the algorithm listing; the unpruning gMSV denotes the method
without pruning in the subgraph mining process, e.g., obtained by
deleting line 13 of the algorithm. We tested both approaches and
recorded the average CPU time used and the average number of
subgraph patterns explored during the procedure of subgraph mining
and feature selection.
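
To illustrate how a lower bound lets a branch-and-bound search skip
parts of the pattern space, the following minimal Python sketch
contrasts a pruned and an unpruned depth-first search on a toy
problem. It is not the gMSV implementation: patterns are item sets
rather than subgraphs, and the score, the bound, the data and all
function names (supports, score, lower_bound, search) are hypothetical
stand-ins for gSide and its lower bound, chosen only so that the bound
is provably valid for every superset of a pattern.

```python
from typing import FrozenSet, List, Tuple

# Toy data (assumption): each "graph" is reduced to a set of item ids,
# standing in for the patterns it contains, plus a binary class label.
DATA: List[Tuple[FrozenSet[int], int]] = [
    (frozenset({0, 1, 2}), 1),
    (frozenset({0, 1, 3}), 1),
    (frozenset({0, 2, 3}), 1),
    (frozenset({1, 2, 4}), 0),
    (frozenset({2, 3, 4}), 0),
    (frozenset({1, 3, 4}), 0),
]
N_ITEMS = 5


def supports(pattern: FrozenSet[int]) -> Tuple[int, int]:
    """Count positive and negative examples that contain the pattern."""
    pos = sum(1 for items, y in DATA if y == 1 and pattern <= items)
    neg = sum(1 for items, y in DATA if y == 0 and pattern <= items)
    return pos, neg


def score(pattern: FrozenSet[int]) -> int:
    """Hypothetical score to MINIMIZE (not gSide): negatives minus positives."""
    pos, neg = supports(pattern)
    return neg - pos


def lower_bound(pattern: FrozenSet[int]) -> int:
    """Bound on the score of every superset: growing a pattern can only
    shrink its support, so a superset covers at most `pos` positives
    and at least zero negatives."""
    pos, _ = supports(pattern)
    return -pos


def search(k: int, min_sup: int, prune: bool):
    """Depth-first enumeration keeping the k lowest-scoring patterns."""
    best: list = []   # (score, sorted pattern), ascending, at most k entries
    explored = 0      # number of frequent patterns actually evaluated

    def grow(pattern: FrozenSet[int], next_item: int) -> None:
        nonlocal explored
        for item in range(next_item, N_ITEMS):
            child = pattern | {item}
            pos, neg = supports(child)
            if pos + neg < min_sup:
                continue  # infrequent, and so is every superset
            explored += 1
            best.append((score(child), sorted(child)))
            best.sort()
            del best[k:]
            # Prune: no superset can beat the current k-th best score.
            if prune and len(best) == k and lower_bound(child) >= best[-1][0]:
                continue
            grow(child, item + 1)

    grow(frozenset(), 0)
    return best, explored


if __name__ == "__main__":
    for flag in (True, False):
        top, n = search(k=2, min_sup=1, prune=flag)
        print("prune" if flag else "no prune", [s for s, _ in top], n)
```

On this toy data both variants return the same top-k scores, while the
pruned search evaluates fewer patterns; ties between equally scored
patterns may be resolved differently by the two variants.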

The comparisons with respect to time complexity and space
complexity are shown in Figure 5 and Figure 6, respectively. On
both datasets, the unpruning gMSV needs to explore an
exponentially larger subgraph search space as we decrease the
min_sup value in the subgraph mining process. When the min_sup
value is too low, the subgraph enumeration step in the unpruning
gMSV can run out of memory. In contrast, the pruning gMSV remains
effective and efficient when the min_sup value becomes very low:
because the lower bound of gSide reduces the subgraph search
space, its running time and space requirement do not increase as
much as those of the unpruning gMSV.

(b) DTI dataset

Fig. 5. Average CPU time for pruning versus unpruning with varying
min_sup.

Fig. 6. Average number of subgraph patterns explored in the mining
procedure for pruning versus unpruning with varying min_sup.


The focus of this paper is to investigate side information
consistency and explore multiple side views in discriminative
subgraph selection. As potential alternatives to the gSpan-based
branch-and-bound algorithm, we could employ other more
sophisticated search strategies with our proposed multi-side-view
evaluation criterion, gSide. For example, we can replace the
G-test score in LEAP, or the log ratio in COM and GAIA, with
gSide. However, as shown in Figure 5 and Figure 6, our proposed
solution with pruning, gMSV, can survive at min_sup = 4%;
considering the limited number of subjects in medical experiments,
as introduced in Section III-A, gMSV is efficient enough for
neurological disorder identification, where subgraph patterns
supported by too few graphs are not desired.


D. Effects of Side Views 


In this section, we first investigate the contributions of the
different side views. Table III shows the performance of gMSV on
the fMRI dataset when considering only one side view at a time.
In general, the best performance is achieved by simultaneously
exploring all the side views. Specifically, we observe that the
side view flow cytometry can independently provide the most
informative side information for selecting discriminative
subgraph patterns on the fMRI brain networks, which might imply
that HIV brain injuries in the sense of functional connectivity
are most likely to be expressed in measurements from this side
view. This is consistent with our finding in Section III-B that
the side view flow cytometry is the most significantly correlated
with the prespecified label information. Results on the DTI
dataset are shown in Table IV.


TABLE III
AVERAGE CLASSIFICATION PERFORMANCES OF gMSV ON THE fMRI
DATASET WITH DIFFERENT SINGLE SIDE VIEWS.

Side views                       Acc.    Prec.   Rec.    F1
neuropsychological tests         0.743   0.851   0.679   0.734
flow cytometry                   0.887   0.919   0.872   0.892
plasma luminex                   0.715   0.769   0.682   0.710
freesurfer                       0.786   0.851   0.737   0.785
overall brain microstructure     0.672   0.824   0.500   0.618
localized brain microstructure   0.628   0.686   0.605   0.637
brain volumetry                  0.701   0.739   0.737   0.731
All side views                   0.972   1.000   0.949   0.973


TABLE IV
AVERAGE CLASSIFICATION PERFORMANCES OF gMSV ON THE DTI
DATASET WITH DIFFERENT SINGLE SIDE VIEWS.

Side views                       Acc.    Prec.   Rec.    F1
neuropsychological tests         0.616   0.630   0.705   0.662
flow cytometry                   0.815   0.847   0.808   0.822
plasma luminex                   0.736   0.801   0.705   0.744
freesurfer                       0.631   0.664   0.632   0.644
overall brain microstructure     0.604   0.626   0.679   0.647
localized brain microstructure   0.723   0.717   0.775   0.741
brain volumetry                  0.605   0.616   0.679   0.644
All side views                   0.973   1.000   0.951   0.974
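
For reference, the four measures reported in Tables III and IV
(accuracy, precision, recall and F1) can be computed from true and
predicted binary labels as in the minimal sketch below. It assumes
that label 1 denotes the positive class; the function name
binary_metrics and the example labels are illustrative and not taken
from the experiments. Because the tables report averages, an averaged
F1 need not equal the F1 computed from the averaged precision and
recall.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 with label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"acc": accuracy, "prec": precision, "rec": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up labels: one positive subject is missed, no false alarms.
    truth = [1, 1, 1, 1, 0, 0, 0, 0]
    preds = [1, 1, 1, 0, 0, 0, 0, 0]
    print(binary_metrics(truth, preds))
```

In the made-up example, precision is perfect while recall is not,
which is the same qualitative pattern as the "All side views" rows.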


E. Feature Evaluation

Figure 7 and Figure 8 display the most discriminative subgraph
patterns selected by gMSV from the fMRI dataset and the DTI
dataset, respectively. These findings examining functional and
structural networks are consistent with other in vivo studies [34]
and with the pattern of brain injury at autopsy [22] in HIV
infection. With the approach presented in this analysis,
alterations in the brain can be detected in initial stages of
injury and in the context of clinically meaningful information,
such as host immune status and immune response (flow cytometry),
immune mediators (plasma luminex) and cognitive function
(neuropsychological tests). This approach optimizes the use of the
valuable information inherent in complex clinical datasets.
Strategies for combining various sources of clinical information
have promising potential for informing an understanding of disease
mechanisms, for identification of new therapeutic targets and for
discovery of biomarkers to assess risk and to evaluate response to
treatment.

(a) (b)

Fig. 7. Discriminative subgraph patterns that are associated with HIV,
selected from the fMRI dataset.

(a) (b)

Fig. 8. Discriminative subgraph patterns that are associated with HIV,
selected from the DTI dataset.

VI. Related Work

To the best of our knowledge, this paper is the first work
exploring side information in the task of subgraph feature
selection for graph classification. Our work is related to
subgraph mining techniques and multi-view feature selection
problems. We briefly discuss both of them.

Mining subgraph patterns from graph data has been studied
extensively. In general, a variety of filtering criteria have been
proposed. A typical evaluation criterion is frequency, which aims
at searching for frequently appearing subgraph features in a graph
dataset satisfying a prespecified min_sup value. Most of the
frequent subgraph mining approaches are unsupervised. For example,
Yan and Han developed a depth-first search algorithm, gSpan. This
algorithm builds a lexicographic order among graphs and maps each
graph to a unique minimum DFS code as its canonical label. Based
on this lexicographic order, gSpan adopts the depth-first search
strategy to mine frequent connected subgraphs efficiently. Many
other approaches for frequent subgraph mining have also been
proposed, e.g., AGM, FSG, MoFa, FFSM, and Gaston.
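
The role of a canonical label can be illustrated with a small sketch.
The code below does not compute gSpan's minimum DFS code; it swaps in
a brute-force canonical form (the smallest relabeling over all node
permutations), which is practical only for very small graphs but
serves the same purpose: isomorphic graphs receive the same key, so
occurrences of a pattern can be counted against min_sup. The function
name canonical_label and the toy graphs are illustrative assumptions.

```python
from collections import Counter
from itertools import permutations
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def canonical_label(node_labels: Sequence[str], edges: Sequence[Edge]):
    """Brute-force canonical form of a small undirected labeled graph.

    Tries every renumbering of the nodes and keeps the lexicographically
    smallest (labels, edge list) pair. This is NOT gSpan's DFS code; it
    only plays the same role of a unique key per isomorphism class.
    """
    n = len(node_labels)
    best = None
    for perm in permutations(range(n)):  # perm[old_id] == new_id
        labels: List[str] = [""] * n
        for old, new in enumerate(perm):
            labels[new] = node_labels[old]
        relabeled = sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges)
        key = (tuple(labels), tuple(relabeled))
        if best is None or key < best:
            best = key
    return best


# Toy patterns: two numberings of the path a-b-c, and a star centered on a.
graphs = [
    (["a", "b", "c"], [(0, 1), (1, 2)]),
    (["c", "a", "b"], [(1, 2), (2, 0)]),
    (["a", "b", "c"], [(0, 1), (0, 2)]),
]

# Support counting: isomorphic patterns collapse onto one dictionary key.
support = Counter(canonical_label(labels, edges) for labels, edges in graphs)

if __name__ == "__main__":
    for key, count in support.items():
        print(count, key)
```

Using the canonical form as a dictionary key is what makes the support
count well defined; gSpan obtains the same effect far more efficiently
through its DFS-code ordering.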

Moreover, the problem of supervised subgraph mining has been
studied in recent work that examines how to improve the
efficiency of searching for discriminative subgraph patterns for
graph classification. Yan et al. introduced two concepts,
structural leap search and frequency-descending mining, and
proposed LEAP, which is one of the first works in discriminative
subgraph mining. Thoma et al. proposed CORK, which can yield a
near-optimal solution using greedy feature selection. Ranu and
Singh proposed a scalable approach, called GraphSig, that is
capable of mining discriminative subgraphs with a low frequency
threshold. Jin et al. proposed COM, which takes into account the
co-occurrences of subgraph patterns, thereby facilitating the
mining process. Jin et al. further proposed an evolutionary
computation method, called GAIA, to mine discriminative subgraph
patterns using a randomized search strategy. Our proposed
criterion gSide can be combined with these efficient search
algorithms to speed up the process of mining discriminative
subgraph patterns by substituting gSide for the G-test score in
LEAP or the log ratio in COM and GAIA. Zhu et al. designed a
diversified discrimination score based on the log ratio, which
can reduce the overlap between selected features by considering
the embedding overlaps in the graphs. A similar idea can be
integrated into gSide to improve feature diversity.
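
As a rough illustration of the discrimination scores mentioned above,
the sketch below computes a log ratio and a G-test score from the
frequency p of a pattern in the positive graphs and its frequency q in
the negative graphs. The formulas are the commonly used forms,
log(p/q) and 2m(p ln(p/q) + (1-p) ln((1-p)/(1-q))) with m the number
of positive graphs; the exact definitions used in LEAP, COM and GAIA
may differ in detail. The smoothing constant eps and the function
names are assumptions added here to keep the logarithms finite.

```python
import math


def log_ratio(p: float, q: float, eps: float = 1e-6) -> float:
    """Log ratio of positive to negative frequency (eps is an assumed smoother)."""
    return math.log((p + eps) / (q + eps))


def g_test(p: float, q: float, m: int, eps: float = 1e-6) -> float:
    """G-test style score for frequencies p (positive) and q (negative).

    m is the number of positive graphs. Frequencies are clipped to
    [eps, 1 - eps] so that every logarithm is defined.
    """
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return 2.0 * m * (p * math.log(p / q)
                      + (1.0 - p) * math.log((1.0 - p) / (1.0 - q)))


if __name__ == "__main__":
    # A pattern present in 80% of positive and 20% of negative graphs.
    print(log_ratio(0.8, 0.2), g_test(0.8, 0.2, m=10))
    # A pattern that is equally frequent in both classes scores zero.
    print(log_ratio(0.5, 0.5), g_test(0.5, 0.5, m=10))
```

Both scores depend only on label frequencies; this is the quantity
that gSide would replace when plugged into those search algorithms.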

There are some recent works on combining multi-view learning and
feature selection. Tang et al. studied unsupervised multi-view
feature selection by constraining similar data instances from
each view to have similar pseudo-class labels [30]. Cao et al.
explored the tensor product to bring different views together in
a joint space and presented a dual method of tensor-based
multi-view feature selection. Aggarwal et al. considered side
information for text mining. However, these methods are limited
in that they require a set of candidate features as input, and
are therefore not directly applicable to graph data. Wu et al.
considered the scenario where one object can be described by
multiple graphs generated from different feature views and
proposed an evaluation criterion to estimate the discriminative
power and the redundancy of subgraph features across all views.
In contrast, in this paper, we assume that one object can have
other data representations of side views in addition to the
primary graph view.

In the context of graph data, the subgraph features are embedded
within the complex graph structures, and it is usually not
feasible to enumerate the full set of features for a graph
dataset before feature selection, since the number of subgraph
features grows exponentially with the size of the graphs. In this
paper, we explore the side information from multiple views to
effectively facilitate the procedure of discriminative subgraph
mining. Our proposed feature selection for graph data is
integrated into the subgraph mining process, which can
efficiently prune the search space, thereby avoiding exhaustive
enumeration of all subgraph features.

VII. Conclusion and Future Work

We presented an approach for selecting discriminative subgraph
features using multiple side views, which has important
applications in neurological disorder diagnosis via brain
networks. We showed that, by leveraging the information from
multiple side views that are available along with the graph data,
the proposed method gMSV can achieve very good performance on the
problem of feature selection for graph classification, and that
the selected subgraph patterns are relevant to disease diagnosis.

A potential extension of our method is to combine fMRI and DTI
brain networks to find discriminative subgraph patterns in the
sense of both functional and structural connections. Other
extensions include better exploring weighted links in the
multi-side-view setting. It would also be interesting to apply
our model to other domains where graph data and side information
aligned with the graphs are available. For example, in
bioinformatics, chemical compounds can be represented by graphs
based on their inherent molecular structures and are associated
with properties such as drug repositioning, side effects, and
ontology annotations. Leveraging all of this information to find
discriminative subgraph patterns can be transformative for drug
discovery.